feat: add partial load rule types #19374
Conversation
changes:

* adds `PartialLoadRule` abstract class to capture load rules that should partially load a segment on some tier
* adds `IntervalPartialLoadRule`, `ForeverPartialLoadRule`, and `PeriodPartialLoadRule` implementations to mirror non-partial load rules
* adds `PartialLoadMatcher` interface to match and select what to partial load
* adds `ExactProjectionPartialLoadMatcher` and `WildcardProjectionPartialLoadMatcher` to do partial loading of projections
* adds `CannotMatchBehavior` enum to describe behavior of `PartialLoadRule` when `PartialLoadMatcher` is unable to match a segment
* since partial loading is not available yet, partial rules function as regular load rules until follow-up work
* tests
```java
  this.matcher = matcher;
  this.onCannotMatch = Configs.valueOrDefault(onCannotMatch, CannotMatchBehavior.FULL_LOAD);
}
```
P1 Partial rule fall-through is invisible to handoff checks
appliesTo returns false for a partial rule whose matcher cannot match and whose onCannotMatch is FALL_THROUGH. Handoff/readiness code that only asks whether the rule applies to the segment interval can therefore miss that this rule intentionally falls through to a later full-load rule, and can conclude that no load rule covers the segment. A realtime task for a segment without matching projections can then wait forever even though the cascade would load it through the following rule. The rule-selection path used by handoff needs to evaluate the same matcher/fall-through semantics as the coordinator cascade, not just interval applicability.
```java
 * version that introduces a new behavior falls back to the constructor's default ({@link #FULL_LOAD}) rather than
 * failing to parse the rule.
 */
public enum CannotMatchBehavior
```
IMO it'd be nicer to define the enums like `FALL_THROUGH("fallThrough")` and have the "fallThrough" be how they serialize in JSON. Similar to `JoinAlgorithm` and `CloneQueryMode`. Most Druid stuff uses `thatStyle` rather than `THIS_ONE`.
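For reference, a minimal dependency-free sketch of that style. The id strings and the lenient fallback are illustrative; in the real class the Jackson hooks would be `@JsonValue` on the getter and `@JsonCreator` on the factory, noted in comments below.

```java
// Sketch of the camelCase-id enum style used by JoinAlgorithm / CloneQueryMode.
// Jackson annotations are omitted so this compiles standalone.
enum CannotMatchBehavior
{
  FULL_LOAD("fullLoad"),
  FALL_THROUGH("fallThrough");

  private final String id;

  CannotMatchBehavior(String id)
  {
    this.id = id;
  }

  // @JsonValue in the real class: controls the JSON wire form.
  public String getId()
  {
    return id;
  }

  // @JsonCreator in the real class: case-insensitive, lenient lookup that
  // falls back to the documented default rather than failing to parse.
  public static CannotMatchBehavior fromString(String value)
  {
    for (CannotMatchBehavior behavior : values()) {
      if (behavior.id.equalsIgnoreCase(value)) {
        return behavior;
      }
    }
    return FULL_LOAD;
  }
}
```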
```java
 * configuration that drives the decision and the wire format of their corresponding {@link LoadSpec} wrapper, so the
 * rule layer stays scheme-agnostic.
 */
@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "type")
```
If we add a new one of these then the PartialLoadRule will fail to deserialize. Just like CannotMatchBehavior, maybe this should be lenient too?
The way to make things lenient is to add a dummy implementation as a `defaultImpl`, such as `QueryCounterSnapshot`. Servers will then at least know that they're dealing with an unrecognized `PartialLoadMatcher`.
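A stdlib-only analogue of what `defaultImpl` buys (the actual fix is just adding `defaultImpl = SomeDummyMatcher.class` to the `@JsonTypeInfo` annotation); every class name below is a hypothetical sketch, not the PR's code:

```java
import java.util.Map;
import java.util.function.Supplier;

// Plain-Java analogue of Jackson's defaultImpl fallback: unknown "type"
// values yield a dummy matcher instead of a deserialization failure.
interface PartialLoadMatcherSketch {}

final class ExactProjectionSketch implements PartialLoadMatcherSketch {}

// Plays the role QueryCounterSnapshot plays as a defaultImpl: servers at
// least know they are holding an unrecognized matcher.
final class UnknownMatcherSketch implements PartialLoadMatcherSketch {}

final class MatcherTypes
{
  private static final Map<String, Supplier<PartialLoadMatcherSketch>> REGISTRY = Map.of(
      "exactProjection", ExactProjectionSketch::new
  );

  // Look up a matcher by its wire type, falling back leniently.
  static PartialLoadMatcherSketch forType(String type)
  {
    return REGISTRY.getOrDefault(type, UnknownMatcherSketch::new).get();
  }
}
```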
```java
  return patterns;
}

@JsonProperty
```
`@JsonInclude(NON_EMPTY)` would be nice.
```java
"type", WIRE_TYPE,
"delegate", baseLoadSpec,
"projections", resolved,
"ruleFingerprint", fingerprint
```
It isn't really a rule fingerprint, it's more of a projection names fingerprint. Maybe just call it `fingerprint` since `projectionNamesFingerprint` is a mouthful.
```java
 * Base for {@link PartialLoadMatcher} implementations that decide which of a segment's V10 projections to load.
 * Subclasses supply the resolution policy via {@link #resolveProjectionNames(DataSegment)}; this base handles
 * fingerprint computation and wraps the result into the {@code partialProjection} load-spec wire form consumed
 * by the historical-side {@code PartialProjectionLoadSpec} (which does not exist yet, supplied in future work).
```
IMO best not to mention things that don't exist yet. Instead, the next PR should update this text. It will need to update it anyway to remove the "supplied in future work" comment.
```java
 */
public abstract class ProjectionPartialLoadMatcher implements PartialLoadMatcher
{
  static final String WIRE_TYPE = "partialProjection";
```
`LOAD_SPEC_TYPE`? Sounds clearer.
```java
{
  final Hasher hasher = Hashing.sha256().newHasher();
  for (String name : sortedDedupedNames) {
    hasher.putString(name, StandardCharsets.UTF_8);
```
The javadoc for `putString` says that `putUnencodedChars` is preferred.

Oh, I looked into this but didn't change it yet because I sort of forgot about it. FWIW we are using `putString` in a lot more places in code than `putUnencodedChars`; the javadocs do say that it is fine so long as Java is the only language involved, so it's probably fine to change. Especially since a version is included in the fingerprint, I guess we could just bump the version if we ever want to support non-Java processes being involved in this stuff in any manner...
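For what it's worth, the fingerprint computation can be sketched with just the JDK, using `MessageDigest` in place of Guava's `Hasher`. Hashing the UTF-8 bytes (what `putString` does) is what would keep the fingerprint reproducible from non-Java code; the zero separator byte is an addition of this sketch, not something from the PR.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.TreeSet;

// Sketch of a projection-names fingerprint: sort, dedupe, hash UTF-8 bytes.
final class ProjectionFingerprint
{
  static String of(List<String> names)
  {
    try {
      final MessageDigest digest = MessageDigest.getInstance("SHA-256");
      // TreeSet sorts and dedupes, making the fingerprint order-insensitive.
      for (String name : new TreeSet<>(names)) {
        digest.update(name.getBytes(StandardCharsets.UTF_8));
        digest.update((byte) 0); // separator so ["ab","c"] != ["a","bc"]
      }
      final StringBuilder hex = new StringBuilder();
      for (byte b : digest.digest()) {
        hex.append(String.format("%02x", b & 0xff));
      }
      return hex.toString();
    }
    catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }
}
```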
```java
private static boolean matchesAny(String name, List<String> patterns)
{
  for (String pattern : patterns) {
    if (FilenameUtils.wildcardMatch(name, pattern)) {
```
How does this treat `\*`? I wonder if it escapes the `*` or if it looks for a literal `\` followed by anything.

It's probably not likely that a projection name will include `*` or `?`, but it would be nice to support escaping anyway, in case they do.
Ah, it did not handle escapes... had Claude help generate a version that supports escaping, which seems to not add too much complexity.
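An escape-aware version can be done by compiling the glob to a regex, along these lines (a sketch only; the details are illustrative, not the PR's actual code):

```java
import java.util.List;
import java.util.regex.Pattern;

// Wildcard matching with backslash escaping: '*' matches any run of
// characters, '?' matches exactly one, and "\*" / "\?" / "\\" match the
// literal character.
final class GlobMatcher
{
  static boolean matchesAny(String name, List<String> patterns)
  {
    for (String pattern : patterns) {
      if (toRegex(pattern).matcher(name).matches()) {
        return true;
      }
    }
    return false;
  }

  private static Pattern toRegex(String glob)
  {
    final StringBuilder regex = new StringBuilder();
    for (int i = 0; i < glob.length(); i++) {
      final char c = glob.charAt(i);
      if (c == '\\' && i + 1 < glob.length()) {
        // Escaped character: match it literally, wildcard or not.
        regex.append(Pattern.quote(String.valueOf(glob.charAt(++i))));
      } else if (c == '*') {
        regex.append(".*");
      } else if (c == '?') {
        regex.append('.');
      } else {
        regex.append(Pattern.quote(String.valueOf(c)));
      }
    }
    return Pattern.compile(regex.toString());
  }
}
```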
```java
@Override
public boolean appliesTo(Interval interval, DateTime referenceTimestamp)
{
  return Rules.eligibleForLoad(period, interval, referenceTimestamp, includeFuture);
```
I wonder if anything weird will happen because `appliesTo(segment.getInterval(), refTs)` doesn't always match `appliesTo(segment, refTs)` like it does with the other rules. This overload is used in two places:

1. `DataSourcesResource#isHandOffComplete`, where it's used to short-circuit handoff checking for segments that no load rule applies to. I suppose the risk here is that we're in a situation where really no rule matches (because the segment doesn't have any desired projections for a partial rule, and there's no other rule to fall back to), but the handoff checker thinks the partial rule would match, and so handoff times out. This seems unlikely to happen (probably in a real cluster there would be another rule to fall back on), but is worth a comment somewhere acknowledging it's a risk. Possibly right here.
2. `TieredBrokerHostSelector`, where it's used to find the rule that will drive which Broker handles a request when `druid.router.tierToBrokerMap` is set. This is probably fine: in some configurations of load rules it might change which broker gets a query, but probably the behavior is defensible given that the routing isn't projection-aware.
I switched `isHandOffComplete` to use the metadata snapshot and the version of `appliesTo` that takes a segment, so that it will work properly.

I didn't make any changes for `TieredBrokerHostSelector`, since I'm not really sure how to improve that; I guess we can document it when we document the rest of this stuff.
```java
  SegmentId.of(dataSourceName, theInterval, version, partitionNumber)
);
// Segment isn't published in metadata; it will never be handed off.
if (segment == null) {
```
[P1] Do not treat a missing metadata snapshot segment as handed off
getRecentDataSourcesSnapshot() may return a snapshot up to the segment poll delay old, so a just-published realtime segment can be absent even though it still needs handoff. Returning true here tells the task the segment will never be handed off and can let it stop waiting before any historical has loaded it. This should return false or force a fresh metadata poll before concluding the segment is not published.
```java
final SegmentId segmentId = SegmentId.of(dataSourceName, theInterval, version, partitionNumber);
DataSegment segment = lookupSegment(segmentsMetadataManager.getRecentDataSourcesSnapshot(), segmentId);
if (segment == null) {
  segment = lookupSegment(segmentsMetadataManager.forceUpdateDataSourcesSnapshot(), segmentId);
```
[P2] Avoid blocking handoff checks on every metadata miss
When the recent snapshot does not contain the segment, this endpoint now calls forceUpdateDataSourcesSnapshot() before returning true. In the v2 incremental metadata manager that call waits for the next cache sync with a timeout of 2 * druid.manager.segments.pollDuration, so a genuinely absent or removed segment can make each handoffComplete request block for up to the poll interval before returning the same true result. Realtime handoff polling is synchronous and iterates callbacks sequentially, so one missing segment can delay completion checks for others. Consider avoiding the forced refresh on every miss, using a cheaper targeted check, or bounding this path so absent segments still complete promptly.
Not sure how often this would happen in practice. We don't currently have easy direct access to the metadata store to fetch just a single segment without some plumbing changes, I think, and I don't think we should do that anyway, because we don't want a bunch of handoff calls hammering the metadata store trying to fetch single segments.

I guess we could return a 404 or something and have the client retry some number of times in the case of a missing segment if we are worried about this, though that seems like it would have its own complexity since we would need to track retries on the client side, so I'd be in favor of not doing that.
It's better to not have to worry about this if partial rules aren't being used. How about:

- add `LoadRule#isIntervalBased` that returns true if the two `appliesTo` methods are equivalent
- check that here; go down the metadata path if false; use the interval only if true.
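A minimal sketch of that hook, with Druid's Joda-typed `Interval`/`DateTime` simplified to epoch millis so the example stands alone; all names and the example rule logic are illustrative:

```java
// Hypothetical LoadRule#isIntervalBased: callers like the handoff check can
// trust the cheap interval-only overload when it returns true, and must use
// the segment-based overload (metadata path) when it returns false.
interface LoadRule
{
  boolean appliesTo(long intervalStartMillis, long intervalEndMillis, long referenceTimestampMillis);

  // True when interval-only applicability is equivalent to segment-based
  // applicability, i.e. the rule never inspects segment contents.
  default boolean isIntervalBased()
  {
    return true;
  }
}

final class PartialLoadRuleSketch implements LoadRule
{
  @Override
  public boolean appliesTo(long intervalStartMillis, long intervalEndMillis, long referenceTimestampMillis)
  {
    // Stand-in for the real period/interval eligibility check.
    return intervalEndMillis > referenceTimestampMillis;
  }

  @Override
  public boolean isIntervalBased()
  {
    // Partial rules also consult the matcher, so interval applicability
    // alone is not enough.
    return false;
  }
}
```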
capistrant left a comment
none of my comments are blocking, will flip to a +1 if you want to pass on them
claude observation - told claude to look at the surface area covered by the different tests since it is hard to keep straight just scrolling and it offered a few ideas for further coverage. just food for thought if you want to expand, I don't see it as blocking

- Mid-string and suffix wildcards. Every match test uses `*` or `?` at the end (`user_*`, `p?`, `session_d*`). A pattern like `*daily` or `user_*v2` would catch a class of bugs that the current tests can't (e.g. if someone someday "optimizes" the prefix case). Cheap to add, real signal.
- No-match-returns-null when the pattern simply doesn't hit anything. `testExcludeAllMatchedReturnsNull` covers the all-excluded path, and `testReturnsNullForProjectionAgnosticSegment` covers the no-projections path. There's no test for "segment has projections, none match patterns": same observable behavior, but a different code path through `resolveProjectionNames`.
- Multiple exclude patterns. Every exclude test uses a single-element `excludePatterns` list. `List.of("user_*", "session_temp_*")` against a segment with names hitting both would pin down that the exclude list is OR'd, not AND'd.
- Case sensitivity. Implicit in the implementation but never asserted. A one-liner (`User_*` vs `user_daily`) locks the contract so a future `Pattern.CASE_INSENSITIVE` change would break a test instead of silently shifting behavior.
I told my claude what your claude said and it made more tests 😛
```java
/**
 * Selects projections whose names match any of the configured glob patterns, minus any names matching an entry in
 * {@code excludePatterns}. Supported glob metacharacters:
```
Any consideration to reserving `[` and `]` in case of desire for supporting character classes in the future?
I didn't really consider this. Originally I was using `FilenameUtils.wildcardMatch`, which is the underlying way that `LocalInputSource` matches file patterns in its spec (via `WildcardFileFilter`); that doesn't support those patterns either (and also didn't support escaping, which is why I have switched away from it).

I think my preference would be to just add a new implementation of `PartialLoadMatcher` if we need more sophisticated matching in the future, but I am open to discussion if anyone feels otherwise.
I think that is a fair path forward. No sense over-complicating things for some maybe-useful future use case when a separate path to supporting it is known.
gianm left a comment
Looks ok to me once this is addressed: #19374 (comment)
Description
Adds a new family of retention rules, `loadPartialByPeriod`, `loadPartialByInterval`, `loadPartialForever`, laying the groundwork for partial loading of V10 segment projections on historicals. This PR includes the rule classes, matcher abstraction, and cascade semantics only; the coordinator plumbing, wire format, and historical-side partial load path are deferred to follow-up work.

changes:

* adds `PartialLoadRule` abstract class to capture load rules that should partially load a segment on some tier
* adds `IntervalPartialLoadRule`, `ForeverPartialLoadRule`, and `PeriodPartialLoadRule` implementations to mirror non-partial load rules
* adds `PartialLoadMatcher` interface to match and select what to partial load
* adds `ExactProjectionPartialLoadMatcher` and `WildcardProjectionPartialLoadMatcher` to do partial loading of projections
* adds `CannotMatchBehavior` enum to describe behavior of `PartialLoadRule` when `PartialLoadMatcher` is unable to match a segment